XML Full-Text Search: Challenges and Opportunities
Authors
Abstract
An ever-growing number of XML repositories are being made available for search, and considerable effort has gone into querying such repositories over the past few years. In particular, full-text querying of text-rich XML documents has raised a wealth of issues that are being addressed by both the database (DB) and information retrieval (IR) communities. The DB community has traditionally focused on developing query languages and efficient evaluation algorithms for highly structured data. In contrast, the IR community has focused on searching unstructured data, and has developed various techniques for ranking query results and evaluating their effectiveness. Fortunately, recent trends in DB and IR research demonstrate a growing interest in adopting IR techniques in DBs and vice versa [1, 2, 3, 4, 5, 6, 7, 9]. Over the past 5 years, the W3C has put a lot of effort into designing the XQuery 1.0 and XPath 2.0 languages, which provide powerful primitives for navigating XML documents. Many database researchers and practitioners have influenced the design of these languages and have been developing XQuery prototypes. On the IR side, INEX, the INitiative for the Evaluation of XML Retrieval [8], was created 3 years ago to assemble XML document collections for assessing scoring and ranking methods for XML that account for document structure, in the same manner as TREC was designed for keyword retrieval. Several prototypes participate in INEX each year, and the basic query language used within this effort is very similar to XPath. The goal of this proposal is to provide a survey of existing research in XML full-text search in DB and IR, including languages, appropriate scoring and ranking methods, implementation architectures, and query evaluation algorithms, and to summarize open research issues such as the joint optimization of queries on both structure and content. We believe that this tutorial is necessary to drive the atten-
Similar resources
Full Text Search in XML Documents
The goal of this paper is to show how XML structure information can be used for full text search in XML documents. Existing products for full text search are investigated regarding their support of XML. The main aspect of this investigation is how the search scope of queries is specified and narrowed by taking advantage of the XML format. Considering the results of this investigation, a suggest...
Personalizing XML Text Search in PimenT
A growing number of text-rich XML repositories are being made available. As a result, more effort has gone into providing XML full-text search that combines querying structure with complex conditions on text, ranging from simple keyword search to sophisticated proximity search composed with stemming and thesaurus. However, one of the key challenges in full-text search is to match users’ ex...
A Method for Evaluating Full-text Search Queries in Native XML Databases
In this paper we consider the problem of efficiently producing results for full-text keyword search queries over XML documents. We describe full-text search query semantics and propose a method, suitable for native XML databases, for efficiently evaluating keyword search queries with these semantics. The method uses an inverted file index which may be efficiently updated when a part of some XML documen...
Full-Text and Structural Indexing of XML Documents on B+-Tree
XML query processing is one of the most active areas of database research. Although the main focus of past research has been the processing of structural XML queries, there is growing demand for full-text search over XML documents. In this paper, we propose XICS (XML Indices for Content and Structural search), which aims at high-speed processing of both full-text and structural queries in XML...
Full-Text and Structural XML Indexing on B+-Tree
XML query processing is one of the most active areas of database research. Although the main focus of past research has been the processing of structural XML queries, there are growing demands for a full-text search for XML documents. In this paper, we propose XICS (XML Indices for Content and Structural search), novel indices built on a B-tree, for the fast processing of queries that involve s...